LLM References
Transfer-Prompting: Enhancing Cross-Task Adaptation in Large Language Models via Dual-Stage Prompts Optimization
Chang, Yupeng, Chang, Yi, Wu, Yuan
Large language models (LLMs) face significant challenges when balancing multiple high-level objectives, such as generating coherent, relevant, and high-quality responses while maintaining efficient task adaptation across diverse tasks. To address these challenges, we introduce Transfer-Prompting, a novel two-stage framework designed to enhance cross-task adaptation in prompt generation. The framework comprises two key components: (1) source prompt construction, which refines the original prompts on source task datasets to generate source prompts with enhanced generalization ability, and (2) target prompt generation, which enhances cross-task adaptation of target prompts by fine-tuning a set of high-scoring source prompts on task-specific datasets. In each optimization cycle, a reference LLM generates candidate prompts from the historical prompt-score pairs and task descriptions embedded in our designed reference prompt. These candidate prompts are refined iteratively, while a scorer LLM evaluates their effectiveness using the multi-dimensional metrics of the objective prompt evaluator, a novel contribution of this work that provides a holistic evaluation of prompt quality and task performance. This feedback loop enables continuous refinement, optimizing both prompt quality and task-specific outcomes. We validate Transfer-Prompting through extensive experiments across 25 LLMs, including 7 foundational models and 18 specialized models, evaluated on 9 diverse datasets. The results demonstrate that Transfer-Prompting significantly improves task-specific performance, highlighting its potential for enhancing cross-task adaptation in LLMs. The code is available at https://github.com/llm172/Transfer-Prompting.
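The abstract describes an iterative loop: a reference LLM proposes candidate prompts from historical prompt-score pairs, and a scorer LLM grades them on multi-dimensional metrics. Below is a minimal sketch of that loop, assuming hypothetical `call_reference_llm` and `call_scorer_llm` helpers and an assumed score-aggregation scheme; it illustrates the described feedback loop, not the authors' implementation.

```python
def call_reference_llm(meta_prompt: str) -> str:
    """Hypothetical: ask the reference LLM for one new candidate prompt."""
    raise NotImplementedError

def call_scorer_llm(prompt: str, dataset: list) -> dict:
    """Hypothetical: score a prompt on multi-dimensional metrics, e.g.
    {"coherence": 0.8, "relevance": 0.7, "task_accuracy": 0.9}."""
    raise NotImplementedError

def optimize_prompts(seed_prompt: str, dataset: list, task_desc: str,
                     rounds: int = 10, top_k: int = 5) -> list:
    """One Transfer-Prompting stage: refine prompts by feeding historical
    prompt-score pairs back to the reference LLM each cycle."""
    def total(scores: dict) -> float:  # assumed aggregation of the metrics
        return sum(scores.values())

    history = [(seed_prompt, call_scorer_llm(seed_prompt, dataset))]
    for _ in range(rounds):
        # Build the reference prompt: the task description plus the best
        # prompt-score pairs seen so far (the format is an assumption).
        best = sorted(history, key=lambda p: total(p[1]), reverse=True)[:top_k]
        meta = task_desc + "\n" + "\n".join(
            f"PROMPT: {p}\nSCORES: {s}" for p, s in best)
        candidate = call_reference_llm(meta)
        history.append((candidate, call_scorer_llm(candidate, dataset)))
    # Stage 1 returns high-scoring source prompts; stage 2 reruns this loop
    # on the target-task dataset, seeded with these prompts.
    return sorted(history, key=lambda p: total(p[1]), reverse=True)[:top_k]
```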
Truthful Aggregation of LLMs with an Application to Online Advertising
Soumalias, Ermis, Curry, Michael J., Seuken, Sven
Online platforms generate hundreds of billions of dollars in revenue per year by showing advertisements alongside their own content. Currently, these platforms are integrating Large Language Models (LLMs) into their services. This makes revenue generation from LLM-generated content the next major challenge in online advertising. We consider a scenario where advertisers aim to influence the responses of an LLM to align with their interests, while platforms seek to maximize advertiser value and ensure user satisfaction. We introduce an auction mechanism for this problem that operates without LLM fine-tuning or access to model weights and provably converges to the output of the optimally fine-tuned LLM for the platform's objective as computational resources increase. Our mechanism ensures that truthful reporting is a dominant strategy for advertisers, and it aligns each advertiser's utility with their contribution to social welfare, an essential feature for long-term viability. Additionally, it can incorporate contextual information about the advertisers, significantly accelerating convergence. Via experiments with a publicly available LLM, we show that our mechanism significantly boosts advertiser value and platform revenue, with low computational overhead. While our motivating application is online advertising, our mechanism can be applied in any setting with monetary transfers, making it a general-purpose solution for truthfully aggregating the preferences of self-interested agents over LLM-generated replies.
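The abstract does not spell out the sampling procedure, but the stated guarantee (convergence to the output of the optimally fine-tuned LLM without touching weights) matches the KL-regularized fine-tuning optimum p*(y) ∝ p_LLM(y) · exp(r(y)/β), which can be approached with a Metropolis-Hastings sampler that needs only generation access. The sketch below illustrates that idea under those assumptions; the payment rule that makes truthful bidding dominant is omitted, and `sample_llm` and the advertiser reward functions are hypothetical.

```python
import math
import random

def mh_aggregate(sample_llm, reward_fns, bids, beta=1.0, steps=1000):
    """Draw a reply approximately from p(y) ∝ p_LLM(y) · exp(Σ b_i·r_i(y) / β).

    sample_llm() draws an independent fresh reply from the base LLM, so the
    p_LLM factors cancel in the Metropolis-Hastings acceptance ratio and only
    the aggregated, bid-weighted rewards remain.
    """
    def tilt(reply):
        return sum(b * r(reply) for b, r in zip(bids, reward_fns)) / beta

    current = sample_llm()
    for _ in range(steps):
        proposal = sample_llm()
        # Accept with probability min(1, exp(tilt(proposal) - tilt(current))).
        if random.random() < math.exp(min(0.0, tilt(proposal) - tilt(current))):
            current = proposal
    return current
```

With more steps the sampler's output distribution approaches the tilted optimum, mirroring the abstract's claim that the mechanism converges as computational resources increase.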
TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification
Gubri, Martin, Ulmer, Dennis, Lee, Hwaran, Yun, Sangdoo, Oh, Seong Joon
Large Language Model (LLM) services and models often come with legal rules governing who may use them and how. Assessing the compliance of released LLMs is crucial, as these rules protect the interests of the LLM contributor and prevent misuse. In this context, we describe the novel fingerprinting problem of Black-box Identity Verification (BBIV). The goal is to determine whether a third-party application uses a certain LLM through its chat function. We propose a method called Targeted Random Adversarial Prompt (TRAP) that identifies the specific LLM in use. We repurpose adversarial suffixes, originally proposed for jailbreaking, to obtain a pre-defined answer from the target LLM, while other models give random answers. TRAP detects the target LLM with a true positive rate above 95% and a false positive rate below 0.2%, even after a single interaction. TRAP remains effective even if the LLM has undergone minor changes that do not significantly alter its original function.
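As a concrete reading of the abstract, the BBIV check reduces to a single probe: send a prompt carrying a suffix optimized offline so that only the target LLM returns a pre-defined answer, then flag the application on a match. The sketch below assumes a hypothetical `query_chat` black-box API, a placeholder suffix, and an arbitrary target string; the suffix search itself (GCG-style, per the jailbreaking literature the authors repurpose) is not shown.

```python
def is_target_llm(query_chat, trials: int = 1) -> bool:
    """query_chat(prompt) -> str is a hypothetical black-box chat API."""
    # Placeholder suffix; in TRAP it is optimized offline so that only the
    # target model completes with the chosen number.
    prompt = ("Write a random string of 3 digits. "
              "<ADVERSARIAL-SUFFIX-OPTIMIZED-OFFLINE>")
    target_answer = "314"  # arbitrary pre-defined answer for illustration
    # Non-target models answer (near) uniformly at random, so even one match
    # is strong evidence; extra trials push the false-positive rate down.
    return all(target_answer in query_chat(prompt) for _ in range(trials))
```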
SMART: Automatically Scaling Down Language Models with Accuracy Guarantees for Reduced Processing Fees
The advancement of Large Language Models (LLMs) has significantly boosted performance in natural language processing (NLP) tasks. However, deploying high-performance LLMs incurs substantial costs, primarily due to the increased number of parameters aimed at enhancing model performance, which has made state-of-the-art LLMs more expensive for end-users. AI service providers, such as OpenAI and Anthropic, often offer multiple versions of LLMs with varying prices and performance. However, end-users still face the challenge of choosing, for their tasks, an LLM that balances result quality with cost. We introduce SMART (Scaling Models Adaptively for Reduced Token Fees), a novel LLM framework designed to minimize the inference costs of NLP tasks while ensuring sufficient result quality. It enables users to specify an accuracy constraint in terms of the equivalence of outputs to those of the most powerful LLM; SMART then generates results that deviate from the outputs of this LLM only with a probability below a user-defined threshold. SMART employs a profiling phase that evaluates the performance of multiple LLMs to identify those that meet the user-defined accuracy level, and it optimizes the tradeoff between profiling overheads and the anticipated cost savings from profiling. Moreover, our approach significantly reduces inference costs by strategically leveraging a mix of LLMs. Our experiments on three real-world datasets show that, based on OpenAI models, SMART achieves cost savings of up to 25.6x compared to GPT-4.
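A minimal sketch of the profiling-then-routing idea described in the abstract, with simplifying assumptions: agreement with the strongest model is estimated on a fixed-size sample rather than with the paper's statistical guarantees, and the mixed-model strategy is omitted. The `models` structure and `agree` predicate are illustrative, not SMART's API.

```python
def choose_model(models, inputs, agree, threshold=0.95, sample_size=100):
    """models: list of (name, cost_per_call, call_fn) sorted cheapest-first,
    with the most powerful (reference) model last. agree(a, b) -> bool."""
    ref_name, _, ref_call = models[-1]
    sample = inputs[:sample_size]
    ref_out = [ref_call(x) for x in sample]  # profiling cost, paid once
    for name, _cost, call in models[:-1]:
        outs = [call(x) for x in sample]
        rate = sum(agree(a, b) for a, b in zip(outs, ref_out)) / len(sample)
        if rate >= threshold:  # cheapest model meeting the accuracy constraint
            return name, call
    return ref_name, ref_call  # fall back to the strongest LLM
```

The remaining inputs are then routed to the returned model, so the reference model's price is paid only during profiling.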
Measuring and Reducing LLM Hallucination without Gold-Standard Answers via Expertise-Weighting
Wei, Jiaheng, Yao, Yuanshun, Ton, Jean-Francois, Guo, Hongyi, Estornell, Andrew, Liu, Yang
LLMs are known to provide factually inaccurate information with apparent confidence, i.e., to hallucinate. This is currently a major obstacle to the reliability and trustworthiness of LLMs [13, 34, 21]. An essential step towards solving this problem is measuring hallucinations. However, this is challenging from a data perspective, as existing metrics presume that benchmark datasets possess gold-standard answers, i.e., "best" or "correct" answers written by humans [16]. The requirement of such answers imposes two fundamental limitations on hallucination measurement: 1) hiring human annotators to produce gold-standard answers is costly in both time and money [4, 43, 38]; 2) gold-standard answers are prone to natural human errors [7, 6, 49]. To this end, we take a step forward and propose a framework that measures LLM hallucinations without requiring gold-standard answers. Our framework is partially inspired by the literature on learning with noisy labels [23, 18, 19], where there are no ground-truth labels for verifying the quality of imperfect human annotations [43, 38, 20], detecting annotation errors [48, 26, 47], or training models robustly [42, 3, 17, 36, 39]. Our basic idea is simple: leverage off-the-shelf, high-quality LLMs to generate answers that serve as a proxy for gold-standard answers. The primary challenge in such an approach is how to properly weigh the expertise of each LLM for a given question x, without a priori knowledge of the true (i.e., gold-standard) answer.
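A sketch of the proxy-answer idea: several off-the-shelf LLMs answer the same question, each reference answer receives an expertise weight, and a candidate answer is scored by its weighted agreement with those proxies. The peer-agreement weighting below is an assumption standing in for the paper's expertise estimate, and `similarity` is a hypothetical scorer (e.g., an embedding or NLI score in [0, 1]).

```python
def hallucination_score(candidate: str, proxy_answers: list[str],
                        similarity) -> float:
    """Score a candidate answer against proxy answers from several LLMs;
    higher return value suggests a more likely hallucination."""
    # Weight each proxy by how much the other proxies agree with it, since
    # no gold-standard answer is available to judge expertise directly.
    weights = []
    for i, answer in enumerate(proxy_answers):
        peers = [p for j, p in enumerate(proxy_answers) if j != i]
        weights.append(sum(similarity(answer, p) for p in peers)
                       / max(len(peers), 1))
    total = sum(weights) or 1.0
    agreement = sum(w * similarity(candidate, a)
                    for w, a in zip(weights, proxy_answers)) / total
    return 1.0 - agreement
```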
Digger: Detecting Copyright Content Mis-usage in Large Language Model Training
Li, Haodong, Deng, Gelei, Liu, Yi, Wang, Kailong, Li, Yuekang, Zhang, Tianwei, Liu, Yang, Xu, Guoai, Xu, Guosheng, Wang, Haoyu
Pre-training, which utilizes extensive and varied datasets, is a critical factor in the success of Large Language Models (LLMs) across numerous applications. However, the detailed makeup of these datasets is often not disclosed, leading to concerns about data security and potential misuse. This is particularly relevant when copyrighted material, still under legal protection, is used inappropriately, either intentionally or unintentionally, infringing on the rights of the authors. In this paper, we introduce a detailed framework designed to detect and assess the presence of content from potentially copyrighted books within the training datasets of LLMs. This framework also provides a confidence estimation for the likelihood of each content sample's inclusion. To validate our approach, we conduct a series of simulated experiments, the results of which affirm the framework's effectiveness in identifying and addressing instances of content misuse in LLM training processes. Furthermore, we investigate the presence of recognizable quotes from famous literary works within these datasets. The outcomes of our study have significant implications for ensuring the ethical use of copyrighted materials in the development of LLMs, highlighting the need for more transparent and responsible data management practices in this field.
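The abstract does not disclose the test statistic, but a common black-box heuristic for this kind of membership question, used here purely as an illustrative stand-in for the paper's framework, is a loss gap: text memorized during training tends to have a lower negative log-likelihood under the suspect model than under a reference model trained on different data. `nll_suspect` and `nll_reference` are hypothetical scoring functions.

```python
import math

def inclusion_confidence(passage: str, nll_suspect, nll_reference,
                         scale: float = 1.0) -> float:
    """Map a loss gap to a (0, 1) pseudo-confidence that `passage` appeared
    in the suspect model's training data (illustrative heuristic only)."""
    gap = nll_reference(passage) - nll_suspect(passage)  # > 0 hints at memorization
    return 1.0 / (1.0 + math.exp(-gap / scale))          # logistic squashing
```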